Proposal for an Algorithm to Improve a Rational Policy in POMDPs
Authors
Abstract
Reinforcement learning is a kind of machine learning. Partially Observable Markov Decision Processes (POMDPs) are a representative class of non-Markovian environments in reinforcement learning. The Rational Policy Making algorithm (RPM) is known to learn a deterministic rational policy in POMDPs. Although RPM can learn a policy very quickly, it needs numerous trials to improve the policy. Furthermore, RPM does not apply to the class of problems in which no deterministic rational policy exists. In this paper, we propose the Rational Policy Improvement algorithm (RPI), which combines RPM and the mark transit algorithm with a χ² goodness-of-fit test. RPI can learn a deterministic or stochastic rational policy in POMDPs. RPI is applied to maze environments, and we show that RPI learns the most stable rational policy in comparison with other methods.
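The abstract mentions a χ² goodness-of-fit test as one ingredient of RPI. The fragment below is only a minimal illustration of such a test in Python, assuming hypothetical outcome counts, a uniform null hypothesis, and a made-up helper name; it is not the authors' actual decision rule in RPI.

```python
# Minimal illustration of a chi-square goodness-of-fit test as it might be
# used to judge action outcomes under one observation.  The counts, the
# uniform null hypothesis, and the helper name are assumptions for
# illustration only, not the decision rule used by RPI.
from scipy.stats import chisquare

def outcomes_fit_uniform(outcome_counts, alpha=0.05):
    """Return True if the observed outcome counts are statistically
    consistent with a uniform distribution over outcomes."""
    statistic, p_value = chisquare(f_obs=list(outcome_counts.values()))
    return p_value >= alpha

# Example: taking one action under one observation led to these successors.
print(outcomes_fit_uniform({"corridor": 18, "junction": 22, "dead_end": 20}))  # roughly uniform -> True
print(outcomes_fit_uniform({"corridor": 55, "junction": 3, "dead_end": 2}))    # one dominant outcome -> False
```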
Similar articles
Solving POMDPs by Searching in Policy Space
Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy and are said to search in value function space. This paper presents an approach to solving POMDPs that represents a policy explicitly as a finite-state controller and iteratively improves the controller by search in policy space. Two related algorithms illustrate this approach. ...
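As a rough illustration of the finite-state-controller representation mentioned above, the sketch below encodes a policy as controller nodes, each mapping to an action and to successor nodes indexed by observation. The node, action, and observation names are hypothetical; this is not the paper's algorithm, only the data structure it searches over.

```python
# Illustrative sketch of a policy represented as a finite-state controller:
# each node selects an action, and the next node is chosen by the observation
# received.  All names here are made up for illustration.
controller = {
    "n0": {"action": "forward",   "next": {"wall": "n1", "open": "n0"}},
    "n1": {"action": "turn_left", "next": {"wall": "n1", "open": "n0"}},
}

def run_controller(controller, observations, start="n0"):
    """Replay the controller on a sequence of observations and return
    the actions it would take."""
    node, actions = start, []
    for obs in observations:
        actions.append(controller[node]["action"])
        node = controller[node]["next"][obs]
    return actions

print(run_controller(controller, ["open", "wall", "open"]))
# ['forward', 'forward', 'turn_left']
```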
Online Policy Improvement in Large POMDPs via an Error Minimization Search
Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical framework for planning under uncertainty. However, most real-world systems are modelled by huge POMDPs that cannot be solved due to their high complexity. To palliate this difficulty, we propose combining existing offline approaches with an online search process, called AEMS, that can improve locally an appro...
Public Spending on Health Service and Policy Research in Canada, the United Kingdom, and the United States: A Modest Proposal
Health services and policy research (HSPR) represents a multidisciplinary field that integrates knowledge from health economics, health policy, health technology assessment, epidemiology, and political science, among other fields, to evaluate decisions in health service delivery. Health service decisions are informed by evidence at the clinical, organizational, and policy levels, levels with distinct...
Optimal and Approximate Q-value Functions for Decentralized POMDPs
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q is computed in a recursive manner by dynamic programming, and then an optimal policy is extr...
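For context on the Q-value recursion mentioned above, the snippet below shows plain Q-value iteration on a tiny, made-up, fully observable MDP; computing Q-value functions for Dec-POMDPs is substantially harder, so this is only a sketch of the underlying dynamic-programming idea, not the paper's method.

```python
# Plain Q-value iteration on a tiny hand-made MDP, illustrating the recursion
# Q(s,a) = sum_{s'} P(s'|s,a) * (R + gamma * max_a' Q(s',a')).
# The MDP itself is made up for illustration.

# transitions[s][a] = list of (probability, next_state, reward)
transitions = {
    "s0": {"stay": [(1.0, "s0", 0.0)], "go": [(0.8, "s1", 1.0), (0.2, "s0", 0.0)]},
    "s1": {"stay": [(1.0, "s1", 0.5)], "go": [(1.0, "s0", 0.0)]},
}
gamma = 0.9

Q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}
for _ in range(200):  # enough synchronous sweeps to converge on this toy problem
    Q = {
        s: {
            a: sum(p * (r + gamma * max(Q[s2].values())) for p, s2, r in outcomes)
            for a, outcomes in acts.items()
        }
        for s, acts in transitions.items()
    }

# Greedy policy extraction from the converged Q-values.
policy = {s: max(acts, key=acts.get) for s, acts in Q.items()}
print(policy)  # e.g. {'s0': 'go', 's1': 'stay'}
```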
The Cross-Entropy Method for Policy Search in Decentralized POMDPs
Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algor...
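The Cross-Entropy method itself is a generic sampling-based optimizer. The toy sketch below, with a made-up objective rather than a Dec-POMDP, shows its elite-sample update loop purely to illustrate the mechanism referenced above.

```python
# Generic Cross-Entropy method on a toy problem (maximize the number of 1-bits
# in a binary string).  The objective is made up for illustration; applying CE
# to Dec-POMDP policies is far more involved than this.
import random

def cross_entropy_bits(n_bits=20, pop=100, elite_frac=0.2, iters=30):
    probs = [0.5] * n_bits                       # Bernoulli parameter per bit
    for _ in range(iters):
        samples = [[1 if random.random() < p else 0 for p in probs]
                   for _ in range(pop)]
        samples.sort(key=sum, reverse=True)      # objective: count of 1-bits
        elite = samples[: int(pop * elite_frac)]
        # Move each bit's probability toward its frequency in the elite set.
        probs = [sum(s[i] for s in elite) / len(elite) for i in range(n_bits)]
    return probs

print([round(p, 2) for p in cross_entropy_bits()])  # probabilities near 1.0
```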
Journal:
Volume / Issue:
Pages: -
Publication year: 1999